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Abstract 

We reconsider the Eigen's quasi-species model for competing self-reproductive 
macromolecules in populations characterized by a single-peaked fitness land- 
scape. The use of ideas and tools borrowed from polymers theory and statis- 
tical mechanics, allows us to exactly solve the model for generic DNA lengths 
d. The mathematical shape of the quasi-species confined around the master 
sequence is perturbatively found in powers of 1/d at large d. We rigorously 
prove the existence of the error-threshold phenomenon and study the quasi- 
species formation in the general context of critical phase transitions in physics. 
No sharp transitions exist at any finite and at d — > oo the transition is of 
first order. The typical r.m.s. amplitude of a quasi-species around the mas- 
ter sequence is found to diverge algebraically with exponent z^_l = 1 at the 

transition to the delocalized phase in the limit d oo. 
PACS: 87.10.+e; 64.60.Cn 
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I. INTRODUCTION 



In recent years there has been an increasing interest of theoretical physics in respect 
to new interesting phenomena for which the general approach of statistical mechanics has 
turned out to be extremely powerful. Typical examples are represented by earthquake 
modelization |T|, forest-fire propagation models financial systems and stock markets 
dynamics 0, portfolio theory |Q and population dynamics 

In the large context of biological models of evolution, the so-called quasi-species model, 
as first introduced by Manfred Eigen has to be considered the paradigm of all systems 
describing the dynamics of competing macromolecular organisms. It mostly relies on Dar- 
winian's natural selection principle as the best suited general theory to explain the evolution 
of species and their competition for life. In general it is believed that this principle has not 
only guided species to their present level of evolution, but also acted at a molecular level in 
order to create the first living beings. The complexity of life as it is, still represents a hard 
challenge for the scientists. The natural questions arising in this context are usually: i) how 
is it possible that among the huge number of possible (stable) molecular structures, natural 
selection has chosen the ones appropriate for the appearance of life on our planet? ii) Why 
this final state is so stable and perfect despite the number of possible casual mutations that 
can occur during evolution? If we count the number of different alternative DNA sequences 
that one obtains by modifying a chain of given length, we would discover that it is so huge 
that we are necessarily forced to admit that the majority of the chemical combinations has 
never being tested by natural evolution. 

In this article we reexamine the Eigen's model in the simplest formulation, with a sharply 
peaked fitness landscape on a lattice. By means of a mapping to an equilibrium problem, 
we solve the model under very general assumptions, and we discuss the consequences of our 
results in more realistic situations. 

The remainder of this paper is organized as follows. In Sec. II and III we give a short 
survey of the quasi-species model as first formulated by Eigen and coworkers. Sections IV 
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and V are devoted to the introduction of our simplified lattice model. More specifically, 
we will show how the Eigen's equations can be mapped into the statistical mechanics of 
directed polymers in a random medium. In Sec. VI we introduce the effective transfer 
matrix associated to the system. It will be used to get some preliminary analytical results. 
Sections VII and VIII contain the basic ingredients towards a full solution of the problem: 
the dual space method and the characterization of the error-threshold phenomenon as a 
thermodynamic phase transition. Finally, in Sec. IX, we get the complete solution of the 
model after summation of the partition function associated. The critical properties at the 
error threshold are calculated. A survey of the main results and a comparison with previous 
approaches are finally summarized in Sec. X. 

II. THE QUASI-SPECIES MODEL. 

In order to look for a mathematical transcription of Darwinian theory we must first 
resume the basic statements of natural selection: i) Life came about through evolution; 
ii) Evolution is the result of mutations for thermodynamic systems out of equilibrium; iii) 
Mutations are due to noncorrect reproductions or errors during the process. 

The selective principle, sometimes called "survival of the fitness" , is actually opposed to 
coexistence among individuals. Even though the fitness landscape had strong fluctuations, 
evolutions would not proceed very far if it were based on correlations among species instead 
of competition. Without a true competition for life, evolution would have needed a much 
larger time (perhaps larger than the life of the Universe!) to explore the advantageous 
mutations among the huge number of different choices in the fitness landscape. 

Darwinian principle is nothing but a sort of deterministic process of selection of the fittest 
individuals among all others with the implicit assumption that an advantageous mutant can 
occur by chance during reproduction. This is, however, not the whole story. As demonstrated 
by Eigen and coworkers in their famous work on species evolution , some guidance principle 
towards the advantageous mutants does exist, as fitter species have more chance to appear 
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that disadvantageous ones. In Darwinian models evolution is guided towards the peaks of 
the fitness landscape, that is, even though no correlation exists between a mutation and the 
fitness of the resulting mutant, there is a tendency provided by the fact that the distribution 
of mutants is fitness dependent and (statistically) not all mutations have same probability 
to occur. 

We say that two mutants belong to the same species if at each position of the DNA chain 
the found symbol is the prevailing one. In a virus chain, 10^ single position errors can be 
present. If their probability is uniform, the wild-type sequence would be, in average, exact 
with probability of about 0.9999. In other words, at each site of a DNA chain one could find 
the same nucleotide by averaging among all the individuals of a given species with an error 
of the order ~ 10"^, even though each mutant can have its own sequence which is different 
from those one of the others! The target of the selection is therefore, not a single individual, 
but a set of mutants whose DNA chain is close, in the statistical sense above defined, to 
that of a wild-type one. 

Let us now introduce the Eigen's model. Imagine that each individual is defined by a 
DNA chain and consider all individuals having a chain of same length d. For each site of 
the chain in the primary structure, we can have k different nucleotides which appear in a 
random manner. In a DNA or RNA structure they can be of 4 different types (G, A, C, 
U) . Alternatively, to simplify the problem, we can decide to distinguish only among purines 
(R) and pyrimidines (Y); in the latter case we assume k — 2. The total number of possible 
sequences of purines and pyrimidines is given by M = 2"^, and results in a extremely big 
number of choices. A single ribosomal RNA (for which d — 120) is one on 10^^ possibilities, 
and a viral genome (typically d ~ 5000) is one among the M ~ 10^'^°'^ alternative sequences. 
For more complex species this number increases even wildly and one can appreciate the order 
of magnitude of the typical numbers involved in the system. In the statistical mechanics 
language, these systems must be represented in a discrete phase space with volume of the 
order of 10^°*'^. 

In order to mathematically define affinity among individuals and species, we need a 
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quantitative measure suitable for mathematical description. This can be achieved by intro- 
ducing the Hamming distance Dh- It is defined as follows: given two individuals /j and Ij 
each having its own sequence of length d, their Hamming distance is given by the number 
of different positions which are occupied by different basis (G,A,C or U). Two individuals 
having a smaller Dh than another couple, are also more biologically affine. 

A correct classification of mutants according to their Hamming distance requires a space 
of dimension d in which each dimension consists of k sites. Mathematically, the configuration 
space Q is a d-dimensional hypercubic lattice in which each side contains k identical sites. 
In the simplest case of only two kinds of basis, {k = 2) each site has a 1-to-l correspondence 
with binary sequences. Therefore each point of represents a given wild- type and its 
neighbors the mutants with closest biological affinity. We assume to assign to each site 
X e f2 a variable, or discrete field -2(x), giving the relative concentration of wild-types of 
kind X in the total population. 

The topological structure of Q has interesting properties. By increasing the dimension 
d, the number of different ways by which two points in fl at distance L can be connected 
increases much faster (as L!) than the number of points having that distance, whose number 
goes as 2^. This has the effect that, if d is large, an enormous amount of sites are confined 
among them with a relative small Hamming distance. Biologically this means that in the 
"genome space" fl, even small mutations (e.g. one-basis error reproductions) can yield, after 
short time, to explore a big region in the whole accessible space, of total dimension 2'^. 

Moreover, as the number of different paths is of the order L\, a given species can easily 
transform into another one by avoiding unfavorable ways (e.g. disadvantageous species). 

Finally, in the very general situation, we must assign to each site in Q a variable iden- 
tifying the fitness of that given sequence. This quantity must be a frozen variable, that 
is its value must be conserved during evolution, as it schematically represents the quality 
of reproduction of that particular DNA sequence. Prom the mathematical point of view, 
the fitness landscape is represented by a rough function and defined by quenched random 
variables. This fact has the effect of rendering the solution of the model a very hard task, 
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like in the spin glass problem 

In his simpler formulation the sequences are self-reproductive, i.e. individuals reproduce 
themselves asexually, and mutants appear through mutations of their respective parents. 
We then introduce a random variable with uniform distribution in [0, 1], the copying fidelity 
Qi. From experimental observations, the typical values of are very close to 1, that is the 
probability that a given reproduction process creates a mutant different from the original 
parent is very small. From simple combinatorics we get that the probability that successive 
consecutive mutations bring a species li to a different Ij (whose reciprocal Hamming distance 
is D) will be 



The mutation matrix Q = {Qi^i\ i,j = 1, 2, ■ ■ ■ , k"^), has elements Qij giving the probability 
of mutation between Jj and Ij. The reader should note that this approach allow for different 
single-base mutations per time step. 

Let us introduce the dynamics by considering the following hypothesis: i) Sequences 
reproduce themselves in a constant fashion and, if any individual is present with concentra- 
tion Tiiit), the rate of change of the population is given by hi(t); ii) Sequences generate by 
asexual reproduction with erroneous replication and the rate depends linearly on the relative 
concentration. 

The most general natural evolution equation for the concentrations ni{t) of the species 
Jj, will then be given by 



In the above formula we have introduced the rate matrix W which contains both diagonal 
and off-diagonal terms. Ai are autocatalytic amplification factors, that is the relative rates 
of replication of the species /j. They equally describe the fitness of the respective individuals, 
as favorable species generate a higher number of offsprings. The diagonal terms Wu, {i = 
1, • • ■ , k'^), correspond to reproduction processes involving perfect replication of sequences. 




(1) 



(2) 
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while off-diagonal terms to mutations of the original ancestor. In order to maintain the 
total population constant, one has to take into account external constraints causing the 
spontaneous death of individuals. This can be simply achieved by summing to the diagonal 
terms the decay rate Di of the species i (counting the number of deaths per unit time). Its 
inverse is the average lifetime. 

It is worth to point out that both Ai and Di are (in general) quenched variables in the 
equations. Each species i is supposed having a given fitness and decay rate, fixed by external 
condition and by genetic informations. These parameters must be considered as "frozen" 
during evolution. 



III. GUIDED EVOLUTION AND ERROR-CATASTROPHE. 

Eigen and coworkers were able to show that this simplified level of description is indeed 
well defined if the concentrations ni{t) are not too high, and the replication rates dni(t)/dt 
linearly depend on the concentrations themselves. At higher densities, the solution saturates 
and the creation of new templates happens in more complex forms (for a review see 0). 
Even taking into account these effects, the proposed model can be shown to stay valid at 
a qualitative level of description, as the system still have rates that linearly depend (in 
average) on the concentrations. There are, however, situations in which a linear model 
cannot describe the actual reproduction mechanisms. A virus can, for instance, reproduce 
in the early stages of an incoming infection at much higher rates than those described by 
Eigen's linear model. 

We are now ready to a deeper investigation of the Eigen's model. To this aim it is 
advantageous to introduce a rescaled quantity 

which represents the fractional population variable. In its complete form we should add to eq. 
(2) a term which takes into account for changes in the population caused by transport effects. 
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To this aim one usually introduces a general "flux" term (f){t) to account for any external 
restriction on the total number of individuals. We can thus write the kinetic equations as 

Xi{t) = J2 W^,Xj{t) - (j){t)Xi{t). (4) 

If one neglects (f){t), the above equation simplifies into a high-dimensional linear differential 
system whose matrix W is supposed to be diagonalizable. (This hypothesis is believed to 
be satisfied in general situations). As, moreover, W is definite positive, Probenius theorem 
applies, that is the maximum (or dominant) eigenvalue Aq is positive and nondegenerate, and 
has a corresponding positive eigenvector. It gives the net production rate of sequences in the 
stationary state, and the corresponding (positive) eigenvector {xi,X2, • ■ ■ ,xn) is associated 
to the relative concentrations of individuals in the total of the population. Formally, the full 
stationary solution is a superposition of uncoupled modes and in the limit of large times the 
evolution is associated to the eigenvector corresponding to Aq. 

It can be shown that the average eigenvalue X{t) acts as a threshold: modes corresponding 
to Aj > A(t) grow indefinitely during evolution, while modes with Aj < X{t) die out. Each 
normal mode corresponds, in the original variables Xi{t), to a set of sequences (or a "clan") 
with high biological affinity. A clan is uniquely defined by an eigenvector and its associated 
eigenvalue. It competes for selection with all other clans and the target of evolution is the 
group corresponding to Aq. If viewed in the original space, a clan is represented by a set of 
sequences distributed around the one corresponding to the largest diagonal term Wa which 
will be called master sequence (MS). The mutants of the MS arc grouped around it in such 
a way that only their averaged sequence equals that of the MS itself, which will be thought 
of as the much abundant individual in the set (though variances can be very large around 
the MS). This set is called quasi- species. 

The picture that emerges from the above considerations is that of a huge number of 
individuals transforming one into the other during evolution. After some time all individuals 
will be found to be close to a limited number of MSs, as less favorable species have already 
died out. The characteristic time necessary to reach a unique MS starting from a flat 
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distribution in the space of sequences is not infinite, despite the number of sites in the 
system. This is due, as previously pointed out, to the topological structure of Q, in which 
points very far apart can be reached in few steps and are linked each other by a tight 
network of different paths. As a consequence, a given sequence will almost certainly find a 
more favorable region in the rugged landscape by performing a walk in Q that avoids passing 
through high potential barriers where it would stay blocked for long time. 

This principle of guided evolution depends on the off-diagonal terms of the matrix W. If 
they are zero, no mutations occur and the global population is stationary. If they are too big 
respect to the diagonal terms Waa, the "diffusion" in Q is overenhanced and the stationary 
state is dominated by a random creation and annihilation of all sequences. In this situation 
the typical spatial amplitude of a quasi-species becomes of the same order of d and no MS 
can be uniquely defined. We would reach the same final state if the fitness landscape would 
be flat, i.e. Ai = const. Vi. As a consequence, we deduce that it may exist a critical value 
of the error rate qc such that if g < gc the class of sequences classified as fittest becomes so 
large that it cannot be sampled by any biological population. 

This phenomenon was indeed shown to exist for a large variety of fitness landscapes 
and it is now well accepted as an intrinsic feature of the quasi-species model. A rough 
estimate of qc (usually called error threshold) can be achieved by noting that in order for a 
given sequence /j to be competitive with other mutants, its exact replication rate Wa must 
be larger than the average production rate of the mutants Ej^t. On this basis it is possible 
to show that the conditions reads 



where Xj are the stationary relative concentrations of the mutants. Since, by definition, 
Wii = AiQo — Di and Qq = q'^ is the probability of exact replication, we find that the critical 
threshold reads 



Wu>E 




(5) 



Qo > 



Ai ~ a 



(6) 
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Hence it follows that, in order to have localization around the MS, the length of the sequences 
must not exceed the critical value 

Incr Incr , . 

ctmax = --j ~ for 1 - g < 1. (7) 

mq I — q 

Once that both q and a are fixed, we then have a severe restriction on the maximum possible 
length which allows selection to find the optimal MS. The above condition can be equally 
rewritten in terms of the autocathalitic rate as 

d 



~ e"^ (8) 



The last inequality can be expressed by saying that in order to maintain a given quasi-species 
stable around a MS one needs the corresponding selective advantage (or fitness) to exceed 
a given threshold. What is surprising is the functional dependence of this threshold on the 
length of the sequences: since typically d if the order of 10^'*^, the minimum Ai requested 
is enormous! Fortunately, this is not devastating, because of the presence of the factor 
1 — g ^ 1 at the denominator in (7). 

IV. TOWARDS A SOLVABLE MODEL OF EVOLUTION. 

A full complete solution of the Eigen's model is not achievable by analytical methods, 
and despite past extensive work |]8|-|l0l, no exact solutions are still available in the literature. 



An important result in this context was achieved by Leuthausser Q, who first showed the 
link between the quasi-species model and the statistical mechanics of lattice surface systems. 

Our goal is to introduce a simplified version of the Eigen's equations which, although 
being well suitable for analytical approach, still retains the basic fundamental features of 
the general system. In particular we will consider a model in discretized time, as in 
and, after having exactly solved the problem for generic sequence lengths rf, we will prove 
that the transition from a localized quasi-species to a random distribution of individuals 
is equivalent to a first order phase transition. The mapping is based on the observation 
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that the system admits a simple representation in terms of equihbrium statistical physics. 
Similar ideas were already introduced by ||^, in which the main idea was to map of the 
ODE (4) into a multidimensional Ising-like spin system at equilibrium. However, due to the 
complex form of the "effective" Hamiltonian resulting from the mapping, which contains 
a complicated interaction term depending on the selective advantages Ai, this approach 
allowed only for numerical solutions. Tarazona |]lOl performed, on this basis, a series of 
interesting computations with different fitness landscapes and found a rich resulting scenario. 

Our idea is to introduce a different mapping of the Eigen's equations to an equilibrium 
statistical system, which, in our opinion, is simpler and more natural than the one used in 
0. By means of this new mapping, in fact, we can directly relate eq.(4) to a well-known 
problem in statistical mechanics, that is, directed polymers in random media (DPRM) ||11|| . 
Due to the large amount of work done in this domain in the past years |]12[, a mapping to 



DPRM is important for many reasons. First of all, the physics of DPRMs has applications 
in a large variety of physical phenomena, and it would be at least interesting to compare all 
these systems with the evolutionary dynamics proposed by Eigen. On the other hand, due 
to the large amount of analytical and numerical work done in the directed polymers context, 
we have a solid background which can be used to understand, on a more rigorous basis, the 
physics behind the quasi-species model. 

In particular, in this paper, we will concentrate on the characterization of the error- 
threshold phenomenon as a phase transition, and the calculation of the critical exponents 
involved (in the simplest case we have considered). Anticipating future conclusions, the 
error-threshold transition turns out to be equivalent to a depinning phase transition of a 
directed polymer by a bulk potential |12[. For sake of completeness, in the last section, we 



will discuss our results in respect to those obtained by previous approaches. 
In order to introduce our model, we first formulate some general hypothesis. 

1. We consider sequences defined by two-state basis (e.g. Y and R); that is we take k = 2. 
Each sequence of length d is made of a combination of "0" and "1" bits and Q is the 
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unitary hypercubic lattice {0, 1}*^. 

2. The fitness landscape is fiat but one point (take the origin 0) having higher fitness. In 
other words we consider a single-peaked distribution of selective advantages, by taking 
Ai = b, ii n 3 X and Ai = a > b, ii Q. 3 x = 0. 

3. The decay rates are zero, i.e. = 0, Vi = 1, 2, • • • , fc'^. We have numerically verified 
that this assumption does not affect our final conclusions. 

4. We consider evolution in discretized time. Eigen's model is (formally) similar to a 
system of coupled master equations in the variables Xi{t) if we interpret Xi{t) as the 
"probabihty to find a locahzed quasi-species around the MS at time t" . If we imagine 
to consider the time as a multiple of a small interval (or waiting time) r, i.e. t = Nt, 
we can write that 



Usually r is simply related to the inverse of the transition probability per unit time 
in the continuous equation. The above relation shows that, apart from the identity 
operator 5ij, the dynamics on the discrete time can be described by the repeated 
application of a 2^^ x 2*^ transfer matrix Tij with i, j = 1, 2, • • • , 2"^. 

5. In general, one should take into account multiple one-basis mutations per time step 
T. This is contained in original Eigen's model as the rate matrix Wij has all non zero 
off-diagonal entries. Nevertheless, we will formulate the hypothesis that the transfer 
matrix Tij can be reduced to another matrix Tjj which allows only single-basis mutation 
per time step. The reason is that has a much simpler structure than T^, since 
almost all off-diagonal elements are zero. We will prove below that using the one-jump 
formulation of the system does not modify the physical picture that emerges from the 
model. In fact, allowing more than one mutation per time step, corresponds to take 




(9) 
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higher powers of Tij, as one can easily see. All our results can be associated, however 
(see below), to the behavior of the set of eigenvectors of the transfer matrix which 
does not depend on the power of Tij we actually take into account. 

We finally note that, without loss of generality, one can take 6=1, apart from unimpor- 
tant multiplicative factors. 

V. THE MODEL. 

Let us consider a d-dimensional hypercubic unitary lattice fl = {0, 1}"*, representing 
the configuration space. For mathematical convenience, we will assume to have periodic 
boundary conditions in all directions, even though this hypothesis is not essential to the 
physics of the problem. Each side of Q is made of only two points representing binary units. 
Each point of fl has a one-to-one correspondence to a sequence /j (i = 1, since the 
cardinality of I is equal to the number of points of Q. We formulate the implicit hypothesis 
that all individuals of the population have the same sequence length d. 

On each site x e Q we have a variable 2{:x.) corresponding to the relative concentration 
of individuals of wild- type /x- Equivalently, we can interpret Z{'x) as the probability to find 
the sequence Jx in the total of the population. At each time step a fraction t e [0, 1] of 
the population of the same wild-type reproduces incorrectly and their sequences change one 
basis among d and transform itself into a new set of individuals ly. In our usual probabilistic 
interpretation, t gives the probability that the MS transforms into ly. 

Since there are d basis for each sequence, the probability that a mutation takes place is 
dt while 1 — dt is the probability of exact replication. In other words, 1 — dt is the fraction 
of the population 2(x) which survive evolution. We need to consider pairs {d,t) such that 
dt < 1. This is not a limitation of our approach, in fact, even though d is usually very large, 
we only study conditions in which the reproduction fidelity is very high, i.e. t <^ 1. 

All sequences have the same fitness b = 1, apart from the origin = (0, 0, ■ ■ ■ , 0) having 
selective advantage a > 1. 
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It is then simple to write down a recursive relation for the relative concentrations 2^Af(x) 
at time on the basis of the above arguments: 

Z^+i(x) = (l + (a-l)5,^ij) 

X (j2 t^N (x + eW) + (1 - dt)Z^{^)^ , (10) 

where we have introduced the unitary vectors e^*-* as those having a"l" bit as ith element 
if X has a "0" in the same position and viceversa. The above equation uniquely defines the 
transfer matrix Tij as Ziv+i(x) = TZAr(x). 

The interpretation of the above relation is simple. At time + 1, the fraction of indi- 
viduals with sequence is equal to (1 — dt) times the original concentration Z^lx.) (this 
corresponds to the individuals who have not experienced any mutation), plus the fraction 
of individuals with Hamming distance equal to 1 respect to x who, after reproduction, have 
mutated to Jx- This fraction is given by tZ]\f (x + e*^*^ ) . Moreover, we have chosen the origin 



as a favored species, that is the population in x = is amplified by a factor a > 1 respect to 
all others. This hypothesis is nothing but a simple mathematical way to impose that there 
exist a single MS Iq. 

In this framework, the existence of a quasi-species characterized by a unique MS corre- 
sponding to (0, 0, ■ ■ ■ , 0) depends on its selective advantage respect to other sequences, i.e. 
on the value of a. We thus expect to find quasi-species formation around Jo if a is larger 
than a threshold ac- 

Roughly speaking, this transition can be equally interpreted in a different context. Let us 
indeed consider a directed elastic polymer (a line) wandering in Q, directed along the "time" 
axis A^, and subjected to an attractive potential located at the origin 0. If the potential is 
uniform in A^, the energy gain per time step located at the wall is ~U. If we introduce a 
vector h^*) G f2, we can use it to identify the position of the polymer at each time step i. 
The elasticity of the polymer, in a discrete geometry, is usually described by restricting the 
one-step polymer fluctuations to be smaller than a fixed threshold. In the literature this 



constraint is usually called RSOS condition, and means that 
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can be or 1 



In a continuous formulation, the polymer statistics is associated to a restricted (i.e. with 
fixed extremes) partition function (here "s" is the continuous analogue of A^) 

/•h(s)=h 

Z(h, .) = / V [h'] (11) 

Jh(0)=O 



X exp d s' {ds'h!{s')f + \/(h', s')] } 



In general, V{\\.,s) is a random potential distributed according to a given density (DPRM 
problem). In the discrete formulation, we introduce a Hamiltonian with short-range uniform 
interaction 

N 

({h}«) =Y.[J |h« - h(^-i)| - f/5h(o,o) , (12) 

i=l 

as our potential is localized at the origin and is attractive, that is ^^(h*^*^^) = —U5^{i) Q. 
The continuous partition function then becomes a sum over all possible realizations of the 



restricted polymer between and [11 



Z;v(x) =Eexp{-^;v({h}«)/r}. (13) 

{h} 

The above sum completely specifies the state of the polymer at a given temperature 
T, or equivalently, at a given potential strength U . By general considerations, we know 
that in the thermodynamic limit the polymer has a phase transition from a localized into 
a delocalized state, depending on T, or, equivalently, on U . As we will discuss below, this 
transition is perfectly defined only at ^ oo, as the cardinality of Vt is finite for every finite 
d and thermodynamic limit does not hold. 

There exists an interesting mapping between the Eigen's model and the statistical me- 
chanics of a directed polymer. For instance, in our case, a simple look at the partition 
function (13) shows that it is mathematically equivalent to the species concentration 2Ar(x) 
which identically satisfies the recursive relation (10), once we have introduced the definitions 
a = exp(f//T) and t = exp(— J/T). That is why we implicitly used the same notation for 
the concentration of individuals and the polymer partition function. 

As an example, let us suppose that for a given set {a, d, t} the polymer is in the localized 
(delocalized) phase: this can be equivalently expressed by saying that evolution brings 
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species preferentially to (apart from) the master sequence Jq. Therefore the error threshold 
transition in the self-reproductive model is reduced to the search of the critical pinning a 
necessary to localize the directed polymer for fixed values of d and t. The error catastrophe 
transition will then be perfectly understood in the general context of thermodynamic phase 
transitions. Even though we will concentrate our study on the simplest case of a single peaked 
fitness, it is worth mentioning that the same formalism applies in more realistic situations, 
for which we are forced to consider a quenched bulk potential, as in (11). Generally speaking, 
studying of the dynamics of the quasi-species model, turns out to be not a simpler problem 
than DPRM. 



VI. THE EFFECTIVE MATRIX. 

In order to calculate the partition sum (13), we must first solve a 2"^ x 2^^ eigenvalue 
problem associated to the transfer matrix T, 2jv_|_i(x) = T2Ar(x). 

As we are interested in the stationary state at — > oo, we don't need to find the whole 
spectrum of T but its spectral radius (i.e. the maximum eigenvalue) e as a function of the 
free parameters {a,d,t}, only, e the only significant contribution to the free energy density 
(per unit length) / = limTv^oo — log(7-^ — TS)/[3N. At large times the action of T is 
dominated by the spectral radius e, that is Z^^^i^s) ~ eZn^-s). 

It is worth considering some simple mathematical preliminaries which will be useful in 
the future. A straight investigation of the transfer matrix shows that it is definite positive 
and irreducible, and then, as a consequence, Perron-Frobenius theorem on finite matrices 



applies |jT3[ : the spectral radius is positive and non-degenerate, and corresponds to a positive, 
unique, eigenvector. 

Due to the high dimensionality of the system (recall that typically d ~ 10^'^ in a virus 
sequence), it is not convenient to use this form of the matrix for numerical investigation. 

To this aim, we observe that the system has a symmetry respect to any change of "1" 
and "0" bits in a given sequence. In other words, if two points x and y of f2 have the same 
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Hamming distance from the MS (0, 0, • • • , 0) they are completely equivalent. The transfer 
matrix is in fact completely invariant in this case under permutation of the two points. 

Therefore the partition function must be invariant under rotations in Q and we can 
restrict ourselves to study its radial dependence only, i.e. 2^(x) = -2(|x|) = 2{u) where we 
have defined v — D^i^, 0). It is a simple combinatorial result that the number of points of 
Q with the same Hamming distance p from the origin is given by M = d\/[{d — 

If we define a new vector as Pn{j^) — J2\x\=u ^n^x) we can equally study our eigenvalue 
problem in terms of a new transfer matrix S defined as Pjv+i(i^) = SPn{t^)- It can be found 
by observing that 

P^(i.) = (l + (a - 1)6^^^) (14) 
+ (e E^2iv(x + e») 

\|x|=i^ i=l 

+ {1-dt) E ^iv(x)| , 

where the last term in parenthesis is simply given by (1— (it)Pjv(i^). By definition, Zjv(x+e*^*'') 
is of the form Zn{x) with |x| = i/ + 1 or |x| = i/ — 1. Hence, after some algebra, we find 
that 

5: Z^(x + e«) = (z/+l) J2 ^iv(x + e») (15) 

|x|=i^ |x|=i/+l 

+ {d-u + l) Yl 2^iv(x + e«). 

|x|=i.-l 

By using this identity we can show that the recursion relation for Pn{i^) reads 

P;v+i(^) = (l + (a - 1)5^,0-) [(1 - dt)P^{v) 

+ t{v + l)Piv(i^ + l) + {d-v + l)Pjv(t/ - 1)] 

with P^(i/) = for v> d. (16) 

We can then study the system by means of an effective {d-\-l) x (rf + 1) matrix S defined 

as 
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a(l — dt) at 

dt (1 - dt) 2t 



V 



(17) 



2t (l-dt) dt 
t (1 - dt) 

It is easy to see that S and T are completely equivalent to our problem, since they have same 
spectral radius, as it turns out from very general results in group theory ||14[. In advantage, 
S is certainly more suitable for numerical diagonalization respect to T. 

What is more important, however, is that we can use the effective matrix to calculate 
some accurate upper and lower bounds for e. This is a consequence of a theorem on positive. 



irreducible matrices: it states that the spectral radius e{A) of a positive matrix A 
satisfies the inequalities 



mm 



iuj J2j dij <e{A) < maxj Y.j « 



Hiinj Si < e{A) < maxj J2i o-i 



In summary, we find that 



£(S)> 



1 



a(l - dt) 

while the upper bound is estimated as 



a < (1 - dt)-^ 
a > (1 - dt)-^ 



(18) 



a < 



l-(d-2)t 



e(S) < I 



a G 



l-dt 

l-{d-2)t l+2t 



l-dt ' l-{d-l)t 

a > d 



(19) 



a(l - dt) + dt 

l + 2t 
a{l -dt + t) 
a(l — dt) + dt 

The result is shown in Fig.(l) where the two curves corresponding to the upper e+(S) and 
lower bound £_(S) are plotted (dashed lines). From the above inequalities we immediately 
find some interesting results concerning our system. In fact we deduce that, Vrf finite, e > 1 



and that in the hmit d — > oo the spectral radius is bounded between two values, converging 
to 

e(S) ^1+ if a < 

e{S) a(l - dt) if a > y~^- (2°) 

This result indicates that a = Oc = 1/(1 — rft) is the critical value of the pinning we need 
to localize the polymer at the origin for any fixed set of parameters (T, J) . It is intuitively 
clear that, rigorously speaking, we cannot have a phase transition at finite d, since finite is 
the cardinality of fl, too. Only in the limit — > oo the polymers has a finite probability do 
completely delocalize from the defect; at any finite dimension it can wander up to a distance 
of the order of d even at A?" — > oo. Naively speaking, we can say that, at large (finite) 
dimensions, and if the pinning strength is not big enough, the polymer is "rough" in the 
sense that it can visit all accessible configuration space up to the maximum size allowed for 
that fixed d. On the other hand, in the "pinned" phase, the transversal locahzation length 
i within which the polymer is confined to the origin is independent on the linear size N 
and is always finite (even at d ^ oo). The two different behaviors take place at a given 
characteristic value Uc (or equivalently Uc) of the pinning potential. Later on we will further 
discuss this problem and its implications in the biological context. 

It is worth noting that, from simple inspection of the effective matrix, one can also get 
some information about the distribution (or concentration) of individuals in the configuration 
space. This can be easily achieved by the knowledge of the eigenvector associated to the 
spectral radius £(S). We consider the sum of its components mjv = J2t=oPN{i^), and from 
the above iterative relation for Pn{i^) have: 

d 

rriN+i = E [(« - l)<^-,o + 1] [(1 - dt)PM{y) 

i/=0 

+ t{v + l)PN{v+l)+t{d-V+l)PN{v -I)] 

= (a-l)[(l-dt)P^(0)+iPjv(l)] 

d d 

+ (1 - dt) E PN{y) + 1 + ^)PNi.^ + 1) 
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- ^^(I/-l)P^,(i/-l) + rf^X^P^(^/-l) (21) 

i/=0 1^=0 

d 

= (a - 1) [(1 - dt)PN{Q) + tPiv(l)] + E 

Apart from a constant multiplicative (normalization) factor, we find, in the thermodynamic 
limit N ^ oo, that 

£ a — 1 

m = . 22 

e-1 a ^ ' 

It is easy to prove that the inverse of m gives (apart from a constant factor) the fraction 
of the population at the origin (that is with MS equal to (0, 0, • • • , 0)). In fact a simple 
calculation shows that m^^ oc -P(O)/ P{^)- 

The dependence of m on the pinning strength a is depicted in Fig. (2) in a semilogarithmic 
scale. We see that m ~ 2^^ for a < Cc, i.e. the fraction of individuals with MS equal to is 
2"*^. In other words, the origin is not, in this situation, a privileged site, as all individuals 
are equally likely to be found in VL. In the opposite situation, at a > Oc, we see that m 
is approximatively given by 1. This means that almost all the population share the same 
sequence, the quasi-species is well defined and evolution has reached a stationary state 
around the master sequence (0, 0, • • • , 0). Remarkably, the transition appears again to occur 
at Oc = (1 — dt)"^. 



VII. DUAL SPACE APPROACH. 

The direct investigation of the effective transfer matrix has given some first insight in the 
physics of the problem, in particular with respect to the origin of the phase transition. The 
simplicity of our model fortunately allows an exact solution which is, however, non-trivial, 
due to the high-dimensionality of the system. 

To this aim we first need to simplify the transfer matrix T by means of an appropriate 
transformation. As the system is defined on = {0, 1}'^, we use a discretized transformation 
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to achieve the result. We then introduce the following dual space representation of the 
partition sum 2^(x): 

Z^,(x)= Yl i-ir'^ZNik), (23) 

k={0,l}d 

and its inverse 

^Ar(k) = ^ E (-ir'2:Ar(x). (24) 

x={0,l}rf 

The dual space is obviously identical to and the Kronecker delta is defined as 2'^(5x,o = 

A rapid inspection shows that this representation implicitly contains periodic boundary 
conditions in all directions. In the dual space the transfer matrix T reads 

Z^+i(k)=5(k)Z^(k) + ^ E s{ci)Z^{q), (25) 

q={0,l}d 

with s(q) = tEti(-l)'^' + l-dt. 

Our goal is then to solve a 2'^-dimensional eigenvalue problem for the dual transfer 
matrix T acting on the r.h.s. of the last equation. In the limit N ^ oo the system reaches 
a stationary state and in this regime T is dominated by its spectral radius s. We can then 
write that in the thermodynamic limit Zjv+i(k) = £Zjv(k) = £Z(k), and 

Let us focus, for the moment, to the computation of s, and define a new constant Q — 
X^kgQ s(k)2^(k). By multiplying both sides by s(k), and summing over k, we finally arrive 
at the equation for e: 

or 
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It is clear that any attempt to directly calculate the sum appearing in the above formula 
is a very hard task, and we are forced to rely on different approaches. First of all we note 
that, since s(k) can take values of kind 1 — 2nt (with n = 1,2, - ■ ■ ,d) in d\/{n\{d — n)!) 
different ways, we can recast the sum as follows 



1 



E 



e-l + 2nt 



(29) 



Despite the simplification, the last expression is still too hard to be exactly solved, never- 
theless it can be used to study the structure of the eigenvalues of the transfer matrix. In 
fact, the r.h.s. of the above equation has d+1 singular points in £ = 1 — 2nt, the largest of 
which is located at £ = 1. For each interval between any two singular points, (29) behaves 
as a continuous function of e and it is monotonic decreasing, then invertible. The solutions 
to the above equation are given by the intersections of this function with the horizontal line 
a/ (a — 1). There are d+1 intersections, each of them corresponding to one eigenvalue of the 
transfer matrix. As the largest singular point is located at £ = 1, we have a unique eigen- 
value larger than 1, and it corresponds to the spectral radius of T. We then concentrate, in 
what follows, on the solution of (27) with the restriction e > 1, disregarding all other roots. 
It is worth noting that in one simple case the sum can be explicitly performed. In fact if we 
take e — l + 2t, the sum reads: 



d 

E 

n=0 



V"/ 



1 2"'-^ (30) 



1+n d+1 ' 



and (29) can be solved for a giving 



2t{d+l) 

{l + 2t){2-2-'i)-2t{d+iy 

Above we have anticipated that no sharp phase transition can occur at any finite d. In 
performing the limit d — > oo we must be sure that t goes to at least linearly in 1/d in 
order to preserve the probabilistic interpretation of the system (recall that dt G [0, 1]). If, 
for instance, we suppose to approach the critical state on the manifold e — 1 + 2t, that is 
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£ — > 1 linearly in t, we have, from (31), that a ^ ac — 1/(1 — dt). This again proves that, 
at least on the above manifold, (1 — dt)~^ is the critical selective advantage separating the 
two phases. 



VIII. THE EXACT SOLUTION. 

Let us consider the eigenvalue equation (28). The idea is to introduce a new represen- 
tation to simplify the formula. The bill we have to pay in this operation is that the final 
result will be expressed in implicit integral form. 

By using a Feynman-like representation, we have 

e 1 



k={o,i}d ^ ken V i=i J 



1 /■°° 

-y^ duF{u,t,e,A), 



(32) 



with A — is — \ ^ dt) I and 



F{u, t, e,A) = Y, exp 
ken 



(33) 



By noting that the sum in the exponent easily factorizes, we get 



Eexp(-^tE(-l)'^ 
ken V^*^ i=i / 



2 cosh I — 



(34) 



and therefore, after a change of variable in the integral, the eigenvalue equation takes the 
form 



= £ r ^'{e-l+dt)u (cosh(Mt))'' du. 

a — 1 Jo 



(35) 



A few remarks are important at this point on the meaning and validity of the above expres- 
sion. It represents, for each fixed set of parameters {T,J,d), an integral implicit relation 
between e and a. Nevertheless, it is not equivalent to the original series solution (28) of the 
spectrum of T since in the above procedure we have implicitly assumed that the integral 
representation were well mathematically defined. In order to do this, we must require that 
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the integral (32) converges. This is indeed the case if and only if t(— 1)'^' < Ae, or 
equivalently, if e > 1. If e < 1 the integral diverges and no real solutions to the above 
equation can be found. As a consequence, we can use eq. (35) to calculate the spectrum of 
T corresponding to eigenvalues > 1. From the previous argument, we know that there exists 
a unique eigenvalue larger than 1, and it corresponds to the spectral radius. In conclusion, 
the unique real solution in e of (35) is the spectral radius of the transfer matrix. Moreover, 
since the integral diverges at e = 1, when the attractive potential at the origin is omitted 
(i.e. a = 1), the maximum eigenvalue must be unitary, too. Then the free energy density / 
vanishes and we attain a delocalized phase, as expected. 

The implicit integral can be expressed in terms of known mathematical functions. After 
successive integrations by parts we have that (here 6 = {e — l)/t) 

la l/ d f d — 1 
~ea-l ~ T5\ ~ 5 + 2 \ ~ 5 + " 

' ^1-^)-))). (36) 



5 + 2d-2\ 5 + 2d 



and recalling the definition of the hypergeometric series of negative argument [|15 



F-m,6;c,z =Z^ (37) 

with {a)n = a{a + 1) ■ ■ ■ (a + n — 1), we finally arrive at the result that 

-d,\-—- + \,-\. (38) 



a-1 e V 2t '2. 

We immediately deduce that (recall definition (22)) mr^ = F (— c?, 1; {e — l)/2t + 1, 1/2). 

Let us define I{d; e, t) the integral in (35). I{d; e, t) is a monotonic decreasing function of 
d. This result can be easily proved by using the integral representation of the hypergeometric 
series. Physically we are interested at the behavior of the system at large dimensions, and 
in this regime we can use a Laplace saddle-point approximation of the integral solution. 
A detailed analysis of the asymptotic development of I{d; e, t) at large d needs however 
particular attention, since we should properly take into account the condition dt < 1. This 
means that both the limits d —>■ oo and t must be performed simultaneously in such a 
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way that a — dt he constant. We are implicitly assuming that t goes to linearly in 1/d, 
but we would obtain the same final result if ^ for d ^ oo. 

Let a be equal to dt, a quantity which must be kept finite during the calculation. We 
see that I{d; e, t) can be written as /q°° du g{u) exp[d/(ii)], with 

fiu) — Incosh(ii) — u, 

a 

g{u) = -. (39) 
a 

Since for £ > 1 the maximum of g{u) is located at the extreme of integration m = 0, 
the integral can be well approximated, at large d, by expanding in a McLaurin series the 
integrand. At first order in 1/d it reads 

, , , .,,„,_ 1 



/ du g{Q) exp [d (/(O) + f{Q)u)] = —-. (40) 

JQ e — I + a 



More precisely, if we take into account higher powers and the relative error, after some more 
algebra we arrive at the approximate result: 

1 a _ 1 ^ {dtf ^ 3{d.tf 



ea-1 e-l + dt d{e - 1 + dtf d'^{e - 1 + dtf 
15{dtf 2{dtY 



1 

+ ofi). (41) 



(£ - 1 + dty (e - 1 + dtf 

If we are interested in the unique real solution of the above algebraic equation, (41) can 
be inverted for the maximum e. The final solution, up to order 0{l/d^) reads, for a e [1, oo) 

e — max < 1, a(l — dt) + -- 



d{a- 1)(1 - dt) 



d2(a-l)3(l-(dt)3) \d\ 

since we know from the above arguments (from the effective matrix), that the spectral 
radius cannot be less than 1 if a > 1. This result can be finally compared with the exact 
calculation performed by numerically finding the spectral radius of T for a given set of 
parameters {d, t, a}, and the two curves are plotted in Fig.(l). 
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In the limit d 



oo we have 



€ 



max{l, a(l — dt)} 



(43) 



a result which coincides with that obtained from the analysis we performed on the effective 
matrix S. Hence the critical selective advantage for the MS (0, 0, ■ ■ ■ , 0) to create a stable 
quasi-species around it, is Oc = (l — dt)"^. In other words, as we will clarify below, Oc defines 
the error threshold for quasi-species formation. Alternatively, one can arrive at the same 
result on the basis of the convexity property of / as a function of d, as it was showed in 
|T6| . Fig. (3) shows the critical dimension dc as a function of the pinning a for two values of 
t. The coincidence between (43) and the numerical result is remarkable. 

IX. THE STATIONARY "GROUND STATE" EIGENVECTOR. 

In order to have a full solution of our system, we still need to calculate the partition 
sum (13), or more precisely, the eigenvector corresponding to the maximum eigenvalue we 
have studied in the previous paragraph. Therefore, let us go back to the recursion relation 
(25) in the dual space. Disregarding, for the moment, the normalization condition, we have 
Z(k) = Q{a — 1)2^'^/ {e — s(k)). In the direct space, it reads 



The summation of the series appearing in the above formula can be done following the 
general procedure as before, that is, at e > 1 we have 




a — I 



1 



(44) 




(45) 



with B = e — 1 + dt and 




(46) 
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Respect to the above case, we now have an additional term in the sum over k e Jl. After 
factorization, we find that 



E(-irexpf|x:(-i)'=') 

ken \-° i=i / 

n E(-l)'"exp((-l)^|). 



(47) 



In this form the formula is still too hard to allow a simple summation, but a rapid in- 
spection shows how to simplify the problem by taking into account the symmetries of the 
system. In fact we know that the partition sum must be the same for any two points 
with equal Hamming distance from the origin. Therefore we can concentrate to study only 
the "radial" function P{u) where u is the Hamming distance from (0, 0, •••,0). In prac- 
tice this observation allows us to neglect the order in which bits "1" and "0" appear in 
(47) . What is physically important is only the number of bits of each kind which are con- 
tained in a given sequence of total length d. If there are u bits of kind "1" , that is if the 
Hamming distance of the respective sequence is u, in the product at the r.h.s of (47) will 
be present u factors of kind exp{ut/B) — exp{—ut/B) = 2smh{ut/B) and d — u of kind 
e:K.p{ut/B) + e:K.p{—ut / B) — 2 cosh.(ut / B) . Finally, as the number of ways we can arrange u 
bits "1" in the total of d bits in d\/[{d — i^)!^^!], we can write that: 



P{p) = Q{a - 1) 



(d\ 



V) 



Jo 



X {smh.{ut)y {cosh.{ut)) 



d—v 



(48) 



The constant Q can be fixed by normalization, that is, if we impose that Z{-x) be summable, 
we must require that Z]^=o-P(^) ~ 1- '^^^ calculation is easy to be performed, in 



fact El 



(d\ 



s\nh.[ut)y {cos\\[ut)Y " — ex.p{udt) and thus, after integration, we get the 



result that E^=o-f(^) — Q{(^ ~ l)/(^ ~ !)■ The normahzed solution reads 



(d\ 



V) 



r du e-(--i+'^*)" 
Jo 



X {cosh{ut)f {i8inh{ut)f . 



(49) 
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At generic d it is not possible to perform the above integral, which is convergent Ve > 1, 
but we can restrict to study the form of the solution at large dimensions. 

Since one may equally characterize the depinning phase transition in terms of U or a, we 
can study its order by considering the discontinuities of the partition sum in a. At d ^ oo 
the maximum eigenvalue is defined by eq. (43). By inserting this expression into -P(z^) we 
simply find that the partition sum is a C° function in a, that is the phase transition is of 
first order. This is also clear if one looks at the shape of ln(m) which can be considered 
a sort of "order parameter", near the critical point Oc (see Fig. (2)). Moreover, from very 



general arguments |]12[, we expect that the typical length within which the polymer is 
confined around the potential, diverges at the critical point as ~ |a — ad"^^ with a given 
characteristic exponent. In a sense, the variable u appearing in (49), can be considered a 
sort of external control parameter for the system described by (44) at equilibrium. 

In order to calculate the critical exponent we can introduce the generating function 
G{X) associated to P{iy), as 

G(A) = (e^'^) = f: P{u)e'\ (50) 



u=0 

The various momenta (m = (z/™) can be calculated from G{X) in the usual way: (m = 
(9^™^G'(A)|a=o- In order to study the behavior of we need the knowledge of the fluctuations 
of the polymer around the origin, and therefore we need the second cumulant /i2 = C2^Ci • We 
thus calculate the connected generating function r(A) = logG'(A), since Hm = d{^^T{X)\x=Q- 
From the above exact formula, we can write that 



G{X) = {e -1) du exp{-(£: - 1 + dt)u + (iln[cosh(Mt)(l + K tanh{ut))]} 
Jo 



/o 
d 



.df°°^ 
e — I)— dx exp < d 
a Jo 



ln[cosh(a;)(l + tanh(a;))] — — x 

a 



(51) 



where K = and x = ut. If we are interested in the large d behavior , the integral can be 
estimated by saddle point methods. The function at the exponent is maximum in x = if 
e > 1, and then 
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G{X) {s - I)- f 
a Jo 




e-l + a{l-K) {{e-l + a)/a- Kfd 



(52) 



Corrections to the previous formula are of the order 0{1/(P). By applying the definition of 
r(A), we finally find that 



As expected, the fluctuations around the average have a power-law divergence at the critical 
point £ — 1. Since £ goes to 1 linearly with a ^ Cc, we deduce that the critical exponent is 
i/_L = 1 at d — >• oo. 

It is also interesting to look at the shape of the partition function in u. Prom the 
biological point of view, it tells us how mutants of a given MS are distributed around it to 
form a quasi-species. If we restrict ourselves, for simplicity, to the leading term in l/d in 
eq.(52), we must inverse transform it to get the real space solution at flrst order. To simplify 
the calculation, we assume that A = i?7 is a complex number, and this allows us to write 



By analytic continuation in the complex plane, rj — z becomes a complex variable and the 
resulting integral can be calculated by means of residues theorem. The integrand has a 
simple pole at z* = — iln[l + {£ — l)/{dt)] and to apply Cauchy's lemma we must close the 
integration path in the semiplane '^{z} < 0. After having calculated the residue in z*, we 
find that Res{z*) = -ia{l + {£ - l)/a)-''-'^. Hence, 



where A/" is a normalization factor. It can be easily calculated by noting that the sum 
involves a truncated geometric series: 




(53) 




(54) 




(55) 
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u=0 ^ ^ 



-1 



+ dt I — a 



dt 



(56) 



The partition function shows an exponential decay as a function of z/. The mass gap ||T2 
is therefore given by log(l + {e — l)/a) ^ {e — l)/a, close to the phase transition. Since 
the transversal correlation length is usually defined as the inverse of the mass gap, we again 
recover the result that, at the critical point, i>± = 1. 

A more refined expression of the partition function at large d can be obtained by directly 
considering a saddle point approximation of (49). Without entering into mathematical 
details (similar to those employed in previous calculations), we see that the integral in (49) 
is dominated by the region close to m* = 0. By expanding the integrand around u* and 
integrating term by term, we finally find 



^-1) 
1 



a 



V) 



d{e — 1 + a) 



e — 1 + a 
1 



2d{e — 1 + ay d 



O 



rf2 



(57) 



In Fig. (4) we compare this approximate result (by only retaining the first term in parenthesis) 
with P^^^u) given by (55). The coincidence of the two curves is good up to ~ u, i.e. in the 
physical range (recall that by definition u < d). In fact it is possible to show that (57) is a 
monotonic increasing function of u for u ^ d, while the exact function is always decreasing. 
The minimum of the approximating function is found indeed for ~ d. More precisely, if 
we only take the first term in (57), a rapid inspection shows that it can be rewritten as 

d\ 



p(i)| 



e-1] 



a 



a 



{d - v)\ d{e-l + a) \d{e - 1 + a)_ 



e — 1 + a 



a 



(58) 



the last approximation being valid iiv <^ d. We then see that, apart from inessential factors. 
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eq. (55) and (58) give the same result only if z/ -C c?. At larger z/, (58) shows the presence 
of power-law corrections in the exponential decay of the partition sum. 



X. COMPARISON WITH PREVIOUS RESULTS AND CONCLUSIONS. 

We are now in position to compare our result with the general approach by coming back 
to the usual "quasi-species" notation. The copying fidelity in a given reproduction process 
has been defined in our model by 1 — dt, while in the original work it was indicated by q'^ 
(see also eq. (7)). Therefore, the first result of our work has been to show that the critical 
threshold for quasi-species formation is given by 

a„ = -i- = 4 = (59) 

1 — at log g 

which coincides with (8). 

Let us now discuss about similarities and differences between our mapping and the 
previous approaches. In the above citated work, Leuthausser introduced a mapping of the 
Eigen's model to a system at equilibrium. In a few words, the mapping goes as follows. Let us 
consider again eq.(4) with discretized time fc, representing "generations" of macromolecules. 
If we define the vector X(/c) = (xi(A;), a;2(fc), ■ ■ ■ , X2d), representing the set of the relative 
concentrations of the macromolecules at time k, Eigen's model can be easily rewritten as 

X(A;) = W*^X(0). (60) 

As in our case, the problem is then reduced to a linear system associated to W. This 
matrix, actually, can be though of as a transfer matrix of an equilibrium system. In fact, if 
one considers only binary sequences Jj, each of them is made of d Ising spins (ai, (J2, ■ ■ ■ , 0"^), 
and the evolution of the system can be represented in a square lattice geometry. One 
side of the lattice (made, for instance, of different rows) has a length equal to that (rf) of 
the sequences, while the other one is semi-infinite in one direction, as each column can be 
associated to the state of the system at time k. The final state, in this geometry, is therefore 
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associated to the edge properties on the lattice, which represents the state of the system 
after N generations. If each site along the binary chain is exactly copied with probability q, 
independently from other sites, the replication matrix takes the form 



W,,=A,q'^—^j' ' . (61) 

This represents a transfer matrix of a two dimensional Ising-like system with nearest neigh- 
bors interactions along the "time direction". The Hamiltonian corresponding to (61) has 
however a very complicated mathematical form 



N-l 



1=0 



/5E^M^' + ln^(^ 



+ ^ln[g(l-g)], (62) 



that, in practice, does not allow for any analytical approach. Tarazona ||T0[ numerically 
solved the system for various fitness landscapes Aj, and discussed the results in respect to 
the original quasi-species model. 

Apart from the intrinsic difficulty in solving problems described by hamiltonians of the 
kind (62), there is a subtle problem contained in this formulation. The actual state of the 
system after generations, depends only on the structure of the layer at the edge of the 
square lattice, that is on the spin configurations at the A^th column. Therefore, as one 
may expect, the error threshold transition cannot be fully understood in terms of the bulk 
properties on the square lattice, as already pointed out in [1^. We thus need the complete 
knowledge of the structure of the lattice surface, and not of the bulk, to solve the original 
Eigen's model. With the Leuthausser mapping, there is no hope to accomplish that goal, 
even for the simplest possible replication landscape. The fact that the critical properties of 
the quasi-species model are associated to surface structures is, in a sense, conserved in our 
mapping, as we have also associated the error threshold problem to the statistical mechanics 
properties of an interface-like object. 

In conclusion, we have analyzed the Eigen's model in the simplest situation characterized 
by a single-peaked fitness. The main issues of our exact solution can be summarized in three 
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main points. First, we have proved that, in the hmit of infinite sequence lengths d, the error 
threshold phenomenon is associated to a first order critical phase transition. Moreover, the 
the typical amplitude of the quasi-species around the MS diverges with exponent z/_l = 1 at 
criticality. Numerical simulations |Ty], seem however to indicate that this picture no longer 
holds for more general situations. It would be extremely interesting to use our mapping 
to investigate these other well. Finally, we have proved that the critical selective 

advantage for quasi-species formation depends exponentially on the sequence length d. 

We believe that, even in more realistic situations, in which the fitness landscape is charac- 
terized by rough fiuctuations from point to point, and with the help of the directed polymers 
theory, the present study can be extended. 
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FIGURES 

FIG. 1. The maximum eigenvalue of the transfer matrix T plotted vs. the selective advantage 
a for d = 100, t = 0.003. The full line has been obtained by numerical diagonalization of the 
transfer matrix. Circles represent the analytical result (up to order 0{l/d^), see eq.(41)). The 
dashed lines are the upper and lower bounds for e obtained from the transfer matrix (see text). 

FIG. 2. log [m(a)] is plotted vs. the selective advantage a for d = 100, t = .003. At 
Oc = (1 — dt)"^ the sharp jump indicates the presence of a depinning transition (which is well 
defined only at c? — >■ oo, as explained in the text). Numerical diagonalization: full line. Analytical 
result: circles. 

FIG. 3. The critical dimension dc plotted vs. a for two distinct values of t. Lower curve: 
t = 10~^; upper curve: t = 10~^. Full lines represent the function dc = t~^{l — l/a) (see text). 
Circles and squares: numerical data from the transfer matrix. 

FIG. 4. Comparison between the two real-space forms of the solution P{i^) at the first signi- 
ficative order in 1/d. The dashed and the full lines correspond to (55) and (58), respectively. 
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